Executive Summary


Research was conducted to evaluate the feasibility of applying Wavelets
and Wavelet Transform methods to transient signal feature extraction
problems. Wavelet transform techniques were developed to extract
low-dimensional feature data that allowed a simple classification
scheme to easily separate the various signals of interest. Additional
development of these techniques will lead to robust feature
extraction methods for transient signals.

Detailed  study  results  are  presented  in  section  3.3  on  page  53. 
Chapter 1

Introduction  and  Overview 


This final report details the results of research conducted into the
feasibility of applying Wavelets and Wavelet Transform methods to transient
signal feature extraction problems. This report contains both theoretical and
experimental results of studies conducted with mechanical transient data. This
work was performed under contract number F49620-91-C-0089 for the Air
Force Office of Scientific Research (AFOSR).

Aware,  Inc.  was  tasked  to  perform  the  following  activities: 

1. Selection of Wavelet Basis Functions and Transform Topologies.
Define and develop wavelet transforms which will provide time-frequency
feature localization methods for the types of mechanical transient
signals provided for analysis;

2. Analysis and Characterization of Signal Features. Measure the
transient feature extraction characteristics of wavelet-based signal
processing algorithms;

3. Prototype Algorithm Development. Select candidate detection
and classification features and develop a prototype algorithm for the
automatic detection and classification of the transient signals of
interest.
Compactly Supported Wavelets

The reason for conducting this study is the remarkable results being
obtained through the application of compactly supported^ wavelets to transient
signal processing. Compactly supported wavelets are a class of mathematical
functions that were discovered in 1986. Some of the particular advantages of
wavelet signal processing methods are:

• Wavelet transforms are computationally efficient. The number of
arithmetic operations required to perform a wavelet transform is linearly
proportional to the number of input data points. The computational
complexity of the more traditional Fast Fourier Transform (FFT) is
proportional to the number of input data points times the logarithm
(base 2) of the number of input data points ($O(N \log_2 N)$). For large
problems, the wavelet methods require only a fraction of the number
of operations required by the traditional methods. This advantage
increases as the problem size increases. In addition, wavelet transform
algorithms can be directly implemented in Very Large Scale Integration
(VLSI) logic devices, and they are fully parallelizable.

• Wavelet transform methods can analyze signals in both the time and
frequency domains. The relative resolution of the time and frequency
components can be flexibly adapted to the problem at hand. The
selection of the appropriate time-frequency resolution can be done
up front at system design time or it can be accomplished with real-time
adaptive algorithms. The traditional Fourier transform suffers from
very poor (or nonexistent) time resolution. This particularly limits
its usefulness in the analysis of time-limited (i.e., transient) signals.
There have been attempts to modify the Fourier technique in various
ways to overcome this limitation, but all of the methods introduce
some additional complexities and compromises. Wavelet methods offer
a very natural means to perform time-frequency signal analysis.

• Wavelets provide the flexibility to choose a particular wavelet function
that is “customized” to the specific application. This is possible since
compactly supported wavelets are an infinite family of complete
orthogonal basis functions. This flexibility to choose basis functions
cannot be matched by the Fourier transform, for it uses only a single set
of basis functions: the complex exponentials (i.e., the sine and cosine
functions).

^Compactly supported means that the functions are identically zero outside a finite
interval.

Report  Overview 

This report begins with an introduction to wavelet phase space. This is
the “playing field” on which we develop the concepts of wavelet signal
processing. The development is intuitive with the rigorous mathematical details
provided in the appendices. The basic concepts of wavelet signal analysis are
developed and contrasted to traditional Fourier analysis methods. This is
followed by a discussion of the computational characteristics and complexity
of wavelet methods. Next the general transient signal processing problem is
introduced and the applicability of wavelet signal processing methods to
various parts of the problem is discussed. This is followed by a discussion of the
results obtained with the specific set of signals which were studied. Excellent
signal separation was demonstrated with the use of low-dimensional
wavelet-extracted features in a simple classifier design.

To  assist  the  reader  in  locating  the  particular  sections  of  this  report  which 
address  the  contractually  required  items,  the  following  guide  is  provided: 

1. The selection of wavelet basis functions and transform topologies is
discussed in section 3.2. Wavelet transforms are defined and developed
on an intuitive level in chapter 2. A rigorous mathematical development
of wavelets and wavelet transforms is presented in appendix B;

2. The analysis and characterization of signal features extracted via
wavelet methods as well as the design of prototype algorithms are presented
in section 3.3.


Chapter  2 

Wavelet  Signal  Processing 


This chapter introduces the concepts of signal, signal processing,
mathematical transforms, phase space representation of signals, the wavelet transform
and wavelet signal processing methods. The mathematical details of wavelets
and wavelet transforms are rigorously discussed in appendix B. The
development here is intuitive with the emphasis on explaining “what” a wavelet
transform does rather than “how” it works or “why” it is mathematically
correct. The Fourier transform will be introduced because it is the signal
processing technique most frequently used today. The objective is to provide
an overview of the capabilities of wavelet transform methods and an
understanding of their relative strengths and weaknesses compared with Fourier
methods.


2.1  Signals  and  Signal  Processing 

A  signal  is  something  that  conveys  information.  In  this  discussion,  signals 
will  usually  consist  of  a  series  of  measurements  of  some  physical  quantity  such 
as  voltage  or  force.  The  information  conveyed  is  often  about  the  occurrence 
of  events  or  the  state  and  behavior  of  a  physical  system. 

The discussion will be limited to the class of signals that could be
represented in a digital computer. These must be discrete signals, i.e., signals that
are represented by a sequence of numbers whose values are of finite precision
(representable by a finite number of bits). The signals may be samples of
a continuous quantity (i.e., measurements of the value of a physical quantity
with a sensor) or values from a completely discrete process (e.g., counting
the occurrences of some event). Due to their discrete nature, these signals
contain finite energy. The energy of a discrete signal is calculated by
summing the squares of its values. Let $s_k$, $k = 0, 1, 2, \ldots$ represent the sequence
of values of a discrete signal. The energy of the signal $E$ is

$$E = \sum_{k} s_k^2 \tag{2.1}$$
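As a minimal sketch of equation (2.1), the energy of a short discrete signal can be computed directly. The function name `signal_energy` and the sample values are illustrative assumptions, not part of the report.

```python
def signal_energy(samples):
    """Return the energy of a discrete signal: the sum of its squared values."""
    return sum(s * s for s in samples)


# Hypothetical four-sample signal: 9 + 16 + 0 + 1.
example = [3.0, -4.0, 0.0, 1.0]
print(signal_energy(example))
```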

Signal Processing is a collection of methods used to change the
representation of a signal into a form that allows the information to be interpreted
more easily. Signal processing is used to locate events or identify system
modes or other characteristic behaviors. Signal processing methods are used
to separate signals from noise or separate signals from each other.

Noise is extraneous energy that is combined with a signal. It interferes
with the ability to interpret the information being conveyed by the signal.
Noise has many sources, both natural and man-made. It can have
characteristics that are either random or deterministic or a combination of both.
Whether energy represents noise or signal sometimes depends on the
interests of the investigator. For example, in speech signal processing,
speaker-dependent characteristics of the acoustic speech waveform are “noise” if it is
desired to extract the meaning of the utterance, but the semantic
characteristics are “noise” if the problem is to identify the speaker.

A simple example of a signal processing operation is calculating the
average or mean of a sequence of values. A signal, consisting of a sequence of
“noisy” temperature measurements, can be “signal processed” by averaging a
large number of samples to separate the steady temperature estimate from
the randomly fluctuating noise.
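The temperature example can be sketched as follows. The steady temperature, the noise level, the Gaussian noise model and the sample count are all hypothetical values chosen for illustration; the report does not specify them.

```python
import random


def mean(values):
    """Average of a sequence of numbers."""
    return sum(values) / len(values)


random.seed(0)            # fixed seed so the sketch is repeatable
true_temperature = 20.0   # hypothetical steady value
noise_std = 0.5           # hypothetical noise level (standard deviation)

# Each reading is the steady value plus randomly fluctuating noise.
readings = [true_temperature + random.gauss(0.0, noise_std)
            for _ in range(10_000)]

estimate = mean(readings)
print(abs(estimate - true_temperature))  # small compared with noise_std
```

Averaging more samples shrinks the residual error further, at a cost that grows in proportion to the number of samples.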

Signal processing methods vary in the amount of computation required to
perform them. We measure the computational complexity of various
methods by establishing a relationship between the number of input data points
to be processed and the number of arithmetic calculations required to
perform the method. In the previous example of the averaging operation, the
complexity is of linear order because the number of operations required to
perform the averaging method is simply proportional to the number of data
points to be averaged. This is written as $O(n)$ (the operation is said to have
a computational complexity of order $n$). A method that required a number
of arithmetic operations proportional to the square of the number of input
points would be of quadratic order and written as $O(n^2)$. This notation is
useful for comparing the relative amount of processing required by different
methods for large amounts of input data. It represents the “trend” or
asymptotic limit for large problems. The comparison of computational complexity
for a particular small processing problem usually requires a more detailed
analysis than simply examining the “trend” for large problem sizes.


2.2  An Introduction to Transforms, Basis
Functions and Phase Space Representations

A transformation (or transform) is a mathematical process which changes
the representation of a signal. Transforms are used in signal processing to
extract information and separate signals from noise. They are described
by properties that characterize their actions. The mathematical theory of
transformations includes both continuous and discrete transforms. We will
only employ discrete transforms since our signals are discrete. Several
important transform properties will be defined to provide a vocabulary for the
discussion of transformation techniques.

Transformations are called invertible if they are reversible. The process
(or transformation) that reverses the action of a particular transformation is
called the inverse of the transformation.

Energy-preserving transformations conserve signal energy. The formula
for computing the energy contained in a signal was defined in the previous
section by equation (2.1). An energy-preserving transformation may change
the representation of a signal dramatically, but the energy of the input signal
and the energy of the transformed output will always be equal.
Non-energy-preserving transformations either lose energy or produce extra energy in the
transformation process.

A special type of transformation is called a linear transformation. For a
linear transformation, the result of transforming two signals and then adding
the (transformed) signals together is exactly the same as adding the two
signals together first and then transforming the sum (of the two signals).
The results of signal processing methods that use linear transformations are
easier to analyze than nonlinear methods because a common type of noise
(typically found in electronic environments) is additive, and in this case the
action on the signal can be separated from the action on the noise since the
two actions do not depend on each other.

Transformations may be block transforms or stream processing transforms.
Block transform methods process the input data in “blocks” (or groups of
a fixed size). The output is also a “block”, not necessarily the same size as
the input block. The choice of block size affects the ability to resolve signal
features larger than a block or smaller than a fraction of the block size.
Block methods have difficulties resolving a signal feature that is split across
two adjacent input blocks. We note that block transform algorithms have
been designed to process a continuous stream of input data. These methods
implicitly divide the input data into segments and the effects of blocking are
always present in the output.

Stream processing transforms generate one or more output streams from
an input stream. Such a method can be visualized as a “pipe”, taking in
data at one end and producing output at the other end. These methods can
process blocks of data, which are just “streams” with a beginning and end.

To illustrate these properties we return to the averaging operation
discussed in the previous section. The “averaging” transformation is a discrete
transformation because it takes a discrete sequence of numbers and converts
it into a discrete value (which might have to be rounded to a given
precision). Averaging is not invertible, as many different sequences of input values
could produce the same average value and there is no way of reversing the
operation to obtain the correct input sequence. The transformation is not
energy-preserving because the sum of the squares of the inputs does not, in
general, equal the square of the average value. The transform is a linear
transformation because the sum of the averages of two sequences does equal the
average of the sum (element by element) of the input sequences. This
transformation can be implemented as either a block or stream process. Data could be
processed in fixed blocks, producing one average value for each input block, or
it could be used to generate a “moving average”, producing a series of outputs
that represent the average value of a “window” that is moved across the input
data.
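The properties claimed for the averaging transformation can be checked numerically. The sketch below uses small, made-up sequences; the helper names (`block_average`, `moving_average`) are illustrative and do not come from the report.

```python
def mean(values):
    return sum(values) / len(values)


def energy(values):
    return sum(v * v for v in values)


def block_average(values, block_size):
    """Block form: one output per full, non-overlapping input block."""
    return [mean(values[i:i + block_size])
            for i in range(0, len(values) - block_size + 1, block_size)]


def moving_average(values, window):
    """Stream form: one output per position of a window sliding by one sample."""
    return [mean(values[i:i + window])
            for i in range(len(values) - window + 1)]


a = [1.0, 2.0, 3.0, 4.0]
b = [4.0, 0.0, 2.0, 6.0]

# Linear: the average of the element-by-element sum equals the sum of averages.
summed = [x + y for x, y in zip(a, b)]
linear = abs(mean(summed) - (mean(a) + mean(b))) < 1e-12

# Not invertible: two different inputs share one average.
same_average = mean([1.0, 3.0]) == mean([2.0, 2.0])

# Not energy-preserving: input energy differs from the squared average.
energy_in, energy_out = energy(a), mean(a) ** 2

blocks = block_average(a + b, 4)   # one value per block of four samples
moving = moving_average(a, 2)      # one value per window position
print(linear, same_average, energy_in, energy_out, blocks, moving)
```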

The behavior of a system can often be understood in terms of modes or
characteristic ways that a system operates or performs. A system may have
a few or a very large number of distinct behaviors that can occur individually
or in combinations. Consider the sound that a piano creates. It can produce
single notes, combinations of notes or no notes at all. The notes can be
played loudly or softly as well as rapidly or slowly. They could be separated
in time or overlapping each other.

In this example the characteristic “modes” are the set of complicated
vibration patterns produced by piano strings when they are struck. There is
one “characteristic mode” per key, for a total of 88 “modes.” One “analysis”
of piano sound seeks to determine which keys were pressed at what time and
how hard they were pressed. We are going to define a “piano transform” to
perform the “signal processing” part of this analysis procedure. The input
to our “piano transform” is (a stream of) piano sounds and the output is
88 streams of numbers that represent how loud a “mode” (or key) is as a
function of time. Zero values mean that a key is not pressed (said another
way, the amplitude of the “mode” is zero). Non-zero values represent how
loud a “mode” is during some interval. The loudness of a note is the amplitude
of the mode. This information could be displayed as 88 graphs, each stacked
up above the other, tracing out how hard each key was pressed over time.

We  are  going  to  perform  the  “piano  transform”  on  a  computer  with  the 
following  algorithm: 

1. Create a mathematical description of the vibrating piano string for each
“mode” (key) and store it for future reference;

2. Read in a section^ of the signal and compare each stored reference
(from the step above) to this section of the signal. Compute a match
(or correlation) score that is proportional to how well the signal matches
the reference;

3.  Output  the  match  score  for  each  “mode”; 

4.  Read  in  another  section  of  signal  and  repeat  the  previous  two  steps. 

This  principle  of  comparing  a  known  reference  to  a  signal  is  conceptually 
how  a  transform  method  works.  The  mathematical  representations  of  the 
modes  (the  references)  are  called  basis  functions.  The  matching  scores  or 
correlations  are  called  the  transform  coefficients  (or  coordinates)  of  a  signal 
with  respect  to  the  basis  functions.  Basis  functions  are  selected  to  model  the 
modes  or  behavior(s)  of  the  system  being  analyzed. 
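The four steps of the “piano transform” can be sketched in a few lines. This is only an illustration under loud assumptions: the “modes” are modeled as unit-energy decaying sinusoids (a stand-in, not a real string model), only three modes are used instead of 88, and the sample rate, frequencies and section length are hypothetical.

```python
import math


def make_reference(frequency, length, sample_rate=8000.0):
    """Step 1: a hypothetical 'mode', a unit-energy decaying sinusoid."""
    raw = [math.exp(-3.0 * k / length)
           * math.sin(2.0 * math.pi * frequency * k / sample_rate)
           for k in range(length)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


def match_score(section, reference):
    """Correlation of one signal section with one stored reference."""
    return sum(s * r for s, r in zip(section, reference))


def piano_transform(signal, references, section_length):
    """Steps 2 to 4: for each section, output one match score per mode."""
    scores = []
    for start in range(0, len(signal) - section_length + 1, section_length):
        section = signal[start:start + section_length]
        scores.append([match_score(section, ref) for ref in references])
    return scores


section_length = 400
frequencies = [220.0, 440.0, 880.0]  # three illustrative modes
references = [make_reference(f, section_length) for f in frequencies]

# Test signal: one silent section, then the 440 Hz mode struck with amplitude 2.
signal = [0.0] * section_length + [2.0 * v for v in references[1]]
scores = piano_transform(signal, references, section_length)
for section_scores in scores:
    print([round(s, 3) for s in section_scores])
```

The score of the struck mode recovers its amplitude, while the other two scores are small but not exactly zero, because these stand-in references are only approximately orthogonal.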


^We  want  to  stress  the  concept  that  more  than  a  single  point  must  be  considered. 
A collection of basis functions is called a basis. If the basis has enough
functions to represent all of the possible signals of a system, it is called a
complete basis. Furthermore, if each of the basis functions represents
information that is independent and uncorrelated, the basis functions are called
orthogonal^. Orthogonality is important and desirable because orthogonal
basis functions have no redundancy in their transform representation.

Returning to the piano example, the basis functions (representations of
the vibrating strings) form a complete and orthogonal basis for piano sounds.
These basis functions have a scale^ relationship between them. There is a
constant ratio^ between the frequencies of adjacent modes (keys). Notes
that are separated by an octave have a ratio of 2 : 1 of their frequencies.
The strings in a piano vary in length depending on the frequency of the note
they produce. Low notes require long strings and high notes are produced by
short strings. In fact, the lengths of the strings have the same ratios between
them as the frequencies.

We  have  discussed  the  length  of  the  strings  because  we  are  going  to  make 
the  length  of  the  “piano”  basis  functions  vary  the  same  way  as  the  string 
lengths.  This  will  affect  the  way  we  compute  the  “piano  transform.” 

Imagine that we plotted all the “piano” basis functions at the same scale.
The functions for the low notes are much longer than the functions for the
high notes. This is because low note “modes” vibrate slower and have their
sound waves spread across more signal values than high note “modes.” To
make this a little more concrete, let us say that a low note basis function has
length 100 and a high note basis function has length 25. (These notes differ
by two octaves since $100 = 2^2 \times 25$.) If 200 samples of piano sound are input
to the piano transform for processing, only two sub-sections of the signal can
be compared against the length 100 low note function while 8 sub-sections
of signal can be compared to the high note function of length 25. The long
functions (which represent phenomena that vary slowly) have a coarse time
resolution that is matched to the slow rate of variation. The short
functions have correspondingly finer time resolution matched to the faster rate
of vibration. This relationship between time resolution and how rapidly
information changes is central to understanding the wavelet transform. Signal
features that vary over small scales (short distances or short time intervals)
can be located precisely in time while features that vary over large scales can
only be located with a correspondingly coarser time resolution.

^This definition of orthogonality is perhaps oversimplified, but will suffice for this
example.

^Do not confuse this with a musical scale, though the ideas are closely related.

^In this example, the ratio is $\sqrt[12]{2} : 1$, i.e., 12 notes span an octave.
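The arithmetic of the 200-sample example can be written out as a short sketch. The function name `positions` is an illustrative assumption; the lengths are the ones used in the text.

```python
def positions(signal_length, function_length):
    """Number of non-overlapping sub-sections a basis function fits into."""
    return signal_length // function_length


signal_length = 200
for name, length in [("low note", 100), ("high note", 25)]:
    count = positions(signal_length, length)
    # The time resolution is one basis-function length: coarse for long
    # functions, fine for short ones.
    print(f"{name}: length {length}, {count} sub-sections, "
          f"time resolution {length} samples")
```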

Compactly supported wavelets are a complete and orthogonal set of basis
functions for the set of all finite energy discrete signals. The wavelet
transform is invertible, energy-preserving and linear. The wavelet transform is a
stream processing method that analyzes both continuous streams of input
data and blocks of data. A (multiplier 2) wavelet basis consists of a
scaling function, a basic wavelet and a collection of smaller wavelets at reduced
scales. The smaller wavelets are created by “shrinking” the basic wavelet by
factors of 2 and shifting (or translating) them by scaled integer distances.
Thus the collections of smaller wavelets are 1/2 and 1/4 and 1/8 (and so
on) the size of the basic wavelet. Wavelet basis functions are all related by
multiples of the constant ratio (2 : 1). The basic wavelet is computed from
the scaling function. The selection of the scaling function determines all the
remaining basis functions. A remarkable fact is that there are an infinite
number of scaling functions, each of which defines a complete wavelet
basis. This provides tremendous flexibility in selecting basis functions that are
appropriate for different systems.

The time resolution of each wavelet is proportional to its length or
duration, smaller wavelets having finer time resolution. Conversely, as the length
of the wavelet increases, its resolution in scale (or frequency) becomes finer.
The trade-off between resolution in time and resolution in scale is discussed
in detail in the next two sections and will not be developed further in this
section.

In general, the computational complexity of wavelet methods is $O(n)$.
The efficiency is the direct result of the simplicity of the wavelet transform
process. The process starts by separating the signal information at the
smallest scale from information at all the larger scales. The output of the first stage
is processed again by the same method, and this is repeated for each successive
scale. This recursive structure reduces the amount of data to be processed at
each successive level by a factor of two, which reduces the computational cost
for each successive transform level. Information which varies rapidly over
just a few data points is separated from information that varies over many
points. The procedure is stopped at the largest scale of interest.
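The recursive structure can be sketched with the Haar wavelet, which is the simplest compactly supported wavelet and is used here purely as an assumption for illustration; the report does not say that Haar was the basis used in the study. The sketch assumes the signal length is divisible by two at every level.

```python
import math


def haar_step(values):
    """Split a sequence into large-scale averages and small-scale details."""
    root2 = math.sqrt(2.0)
    averages = [(values[i] + values[i + 1]) / root2
                for i in range(0, len(values), 2)]
    details = [(values[i] - values[i + 1]) / root2
               for i in range(0, len(values), 2)]
    return averages, details


def haar_transform(values, levels):
    """Return (coarsest averages, [details at smallest scale, ..., largest])."""
    current, all_details = list(values), []
    for _ in range(levels):
        current, details = haar_step(current)  # half as much data each level
        all_details.append(details)
    return current, all_details


def haar_inverse(averages, all_details):
    """Undo haar_transform, rebuilding from the largest scale down."""
    root2 = math.sqrt(2.0)
    current = list(averages)
    for details in reversed(all_details):
        rebuilt = []
        for a, d in zip(current, details):
            rebuilt += [(a + d) / root2, (a - d) / root2]
        current = rebuilt
    return current


def energy(values):
    return sum(v * v for v in values)


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coarse, details = haar_transform(signal, levels=3)
pairs_processed = sum(len(d) for d in details)  # 4 + 2 + 1
reconstructed = haar_inverse(coarse, details)
print(pairs_processed, energy(signal),
      energy(coarse) + sum(energy(d) for d in details))
```

For this 8-sample input the three levels process 4, 2 and 1 pairs, so the total work stays below the input length, which is the source of the linear-order cost described above.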

The outputs of the wavelet transform are coefficients that represent the
similarity of the signal (as a function of time) to the wavelets at different
scales. The output of a wavelet transform can be plotted on a grid that
has time on the x-axis (or t-axis) and scale on the y-axis. This grid is called
phase space and is used to graphically display the relationships between signal
information at different scales.

We  have  used  the  piano  example  to  introduce  the  basic  ideas  required 
to  intuitively  understand  how  transform  methods  work  and  what  a  wavelet 
transform  does.  We  will  now  discuss  the  Fourier  transform  and  illustrate 
how  it  is  different  from  the  wavelet  transform. 

The  Fourier  transform  is  invertible,  energy-preserving  and  linear.  The 
Fourier  basis  functions  are  orthogonal  and  complete  for  the  entire  set  of 
finite  energy  discrete  signals. 

The  Fourier  transform  separates  signal  information  by  frequency.  The 
Fourier  transform  is  a  block  transform  which  requires  an  a  priori  choice 
of  block  size.  The  discrete  Fourier  basis  functions  are  uniformly  sampled 
constant  frequency  sine  and  cosine  functions  each  of  which  persists  as  long 
as  the  block  size.  The  basis  functions  are  uniformly  spaced  in  frequency 
from  zero  to  one-half  the  block  size.  The  frequencies  are  separated  by  a 
constant  interval  rather  than  a  constant  ratio.  Since  all  the  basis  functions 
are  as  long  as  the  input  data  block,  they  all  have  the  same  (lack  of)  time 
resolution.  The  output  of  a  Fourier  transform  contains  information  about 
how  the  energy  in  the  signal  is  distributed  among  the  frequencies  in  the  signal. 
However  information  about  how  the  energy  is  distributed  in  time,  about  when 
it  occurred,  is  not  available  in  the  Fourier  transform  representation.  All  that 
can  be  inferred  is  that  the  frequency  was  present  somewhere  in  the  block 
and for what fraction of the signal energy it accounted. The computational
complexity of the Fast Fourier Transform (for the commonly used
Cooley-Tukey algorithm) is $O(n \log_2 n)$.
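The loss of timing information in the energy-per-frequency picture can be illustrated with a naive discrete Fourier transform. This sketch is an illustration only: it uses a direct $O(n^2)$ sum rather than the Cooley-Tukey algorithm, and the block size and transient positions are made-up values. The timing is carried in the phase of the complex coefficients, which the energy distribution discards.

```python
import cmath


def dft(block):
    """Naive discrete Fourier transform of one block (for clarity, not speed)."""
    n = len(block)
    return [sum(block[k] * cmath.exp(-2j * cmath.pi * f * k / n)
                for k in range(n))
            for f in range(n)]


def energy_per_frequency(block):
    """Squared magnitude of each Fourier coefficient."""
    return [abs(c) ** 2 for c in dft(block)]


n = 16
early = [0.0] * n
late = [0.0] * n
early[2] = 1.0    # transient near the start of the block
late[11] = 1.0    # the same transient near the end of the block

spectrum_early = energy_per_frequency(early)
spectrum_late = energy_per_frequency(late)
largest_difference = max(abs(a - b)
                         for a, b in zip(spectrum_early, spectrum_late))
print(largest_difference)  # negligible: the two energy spectra coincide
```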

In summary, the primary difference between the Fourier transform and
the wavelet transform is in how each separates signal information between
time and scale^.

The Fourier transform is a block transform that separates signal
information into uniformly spaced frequency components. The Fourier transform
has fine frequency resolution and a complete lack of time resolution. Larger
block sizes increase the range of frequencies that the Fourier method can
resolve, but further decrease the time information available from a signal.

^Scale is related to wavelength which is defined as 1/frequency for periodic functions.
For periodic signals, there is a close correspondence between scale and wavelength and
therefore, frequency. The behavior of aperiodic signals can be analyzed in terms of large
and small scale variations that are not periodic and do not have a well defined wavelength.

Wavelet transforms separate signal information by scale and time. There
are an infinite number of wavelet bases from which an appropriate basis can be
selected. The transform information is defined by a constant scaling
relationship with the time resolution proportional to the scale. The computational
complexity of the wavelet transform is $O(n)$, which is less than the
$O(n \log_2 n)$ of the fast Fourier transform.
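To give a feel for the two growth rates, the sketch below tabulates $n$ against $n \log_2 n$ for a few block sizes. It compares trends only: the constant factors of real wavelet and FFT implementations are ignored, and the chosen sizes are arbitrary.

```python
import math


def linear_count(n):
    """Operation count growing in proportion to n (constant factor omitted)."""
    return n


def n_log2_n_count(n):
    """Operation count growing as n times log2(n) (constant factor omitted)."""
    return n * math.log2(n)


for n in (1024, 65536, 1048576):
    ratio = n_log2_n_count(n) / linear_count(n)
    print(n, ratio)  # the ratio is log2(n), so the gap widens as n grows
```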

The next two sections will refine a few of the mathematical concepts
related to wavelet basis functions and the mechanics of the wavelet transform
process. At the expense of a few details and mathematical precision, the
reader may skip to section 2.3.3 or even 2.4 without loss of continuity.


2.3  The  Scaling  Function,  Wavelets,  and  the 
Wavelet  Transform 

Wavelet methods separate the components of a signal by scale. Here scale
means a level of detail or resolution. Its significance can be expressed in terms
of a “correlation length”, or the length over which data at a given scale tends
to vary significantly. Scale and frequency are logically independent concepts.
In many circumstances there is a relationship between scale and frequency
that arises from the specific details of the signal processing problem. In these
cases, instead of referring to low frequency and high frequency, one refers to
large scale and small scale. The representation of a signal in terms of scale
(the wavelet representation) separates the signal into components that are
independent and uncorrelated (i.e., orthogonal), yet each has a well defined
scale-specific level of detail.

With  a  wavelet  transform,  both  the  time  scale  resolution,  or  “correla¬ 
tion  length”,  and  the  frequency  resolution  vary  logarithmically.  Wavelet 
techniques  divide  the  spectrum  of  a  signal  into  equal  width  bands  on  a  log- 
frequency  scale;  they  provide  an  octave  band  decomposition  of  the  signal. 
This  scale-oriented  approach  provides  for  finer  frequency  resolution  in  low 
frequency,  large  scale  b2Uids,  and  lesser  frequency  resolution  at  high  frequen¬ 
cies  and  small  scales.  Correspondingly,  the  wavelet  decomposition  provides 
for coarse temporal resolution at low frequencies (since changes happen slowly
at low frequencies), and fine temporal resolution at high frequencies, where
rapid  changes  may  occur. 
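
As a rough illustration of the octave band decomposition described above, the following Python sketch lists the band edges (as fractions of the full bandwidth) and the number of coefficients produced at each level. The function name `octave_bands`, the input length of 64 and the choice of three levels are hypothetical and serve only this illustration.

```python
def octave_bands(num_samples, levels):
    """List (label, low_edge, high_edge, coefficient_count) for each band.

    Band edges are fractions of the full bandwidth. Level 1 is the
    smallest scale (highest octave); the last entry is the residual
    large scale (low-pass) band. Names and layout are illustrative.
    """
    bands = []
    for level in range(1, levels + 1):
        high_edge = 1.0 / 2 ** (level - 1)
        low_edge = 1.0 / 2 ** level
        bands.append((f"octave {level}", low_edge, high_edge, num_samples // 2 ** level))
    bands.append(("low-pass", 0.0, 1.0 / 2 ** levels, num_samples // 2 ** levels))
    return bands


bands = octave_bands(num_samples=64, levels=3)
for label, low_edge, high_edge, count in bands:
    print(f"{label:>9}: {low_edge:.4f} to {high_edge:.4f}, {count} coefficients")
```

Each octave spans a factor of two in frequency, so the bands have equal width on a log-frequency scale, while the high octaves carry more coefficients and therefore finer temporal resolution.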

By  focusing  on  scale  resolution  rather  than  on  frequency  or  time  alone, 
the  wavelet  technique  considers  the  reciprocal  relationship  between  time  and 
frequency  (or  any  type  of  structure  that  is  expressed  across  multiple  data 
points).  To  identify  or  locate  the  position  of  a  particular  shape,  such  as 
an  oscillation,  in  a  set  of  data,  one  must  look  for  relationships  among  the 
data  values;  a  structure  or  an  oscillation  exists  only  across  a  set  of  data, 
and  not  in  a  single  point  value.  A  number  has  no  frequency,  no  structure. 
Conversely,  properties  that  exist  only  across  a  set  of  data  cannot  be  said 
to  have  a  particular  location  within  that  set.  Wavelet  signal  representation 
techniques  take  this  trade-off  into  account  by  allowing  small  sets  of  data  to 
be  combined  and  correlated  to  derive  structured  or  shape  information  about 
that  subset,  without  requiring  a  complete  transformation  of  the  signal  into  a 
particular  type  of  structural  information,  the  way  a  Fourier  transform  does. 
One is allowed to exchange a small amount of temporal resolution for a small
amount of scale information, a pay-as-you-go system.

It  is  no  accident  that  these  properties  are  reflected  in  the  characteristics  of 
many  natural  signals.  Signals  with  time  varying  characteristics,  like  speech, 
music,  seismic  signals  and  underwater  acoustic  signals  are  all  best  analyzed  by 
a  system  capable  of  resolving  both  frequency  and  time.  Furthermore,  many 
signal-producing  phenomena  have  octave  band  structure  due  to  the  presence 
of  harmonics  within  the  signal.  Transient  events  also  respond  well  to  multiple 
scale  analysis  in  that  the  identification  of  precisely  located  phenomena,  such 
as the sharp onset of a signal, requires the ability to resolve its location in
time at a very fine scale, while the characteristics of later, more persistent,
parts  of  the  signal  may  require  the  ability  to  identify  larger  scale  structures. 

2.3.1  The  Scaling  Function 

The  scaling  function  is  at  the  core  of  any  wavelet  based  representation  of  a 
signal.  We  will  discuss  only  compactly  supported  wavelets  in  this  report.  The 
scaling  function  has  three  essential  properties.  The  first  is  that  it  is  compactly 
supported.  This  means  that  the  scaling  function  is  exactly  zero  outside  a 
bounded  region  of  the  real  line.  The  scaling  function  is  only  locally  non-zero. 
The  second  essential  property  is  that  the  scaling  function  is  orthogonal  to 
integer translates of itself. The importance of this will become clear a little
later. The third property is that the scaling function is intimately related to
smaller scaled versions of itself. This relationship is expressed concisely by
the scaling equation:

$$\varphi(x) = \sum_{k=0}^{N-1} a_k \, \varphi(2x - k) \qquad (2.2)$$

where $\varphi(x)$ is the scaling function. The function $\varphi(2x)$ is a smaller, scaled
down (by a factor of two), version of $\varphi(x)$. The scaling equation states that
$\varphi(x)$ is equal to a weighted sum of these smaller shifted versions of itself.
The numbers $a_k$, of which only finitely many ($N$, which is an even number
here) are non-zero, are called the scaling coefficients. $N$ is the size of the
wavelet system. The support of $\varphi$, the region on which it is non-zero, is
the interval $[0, N-1]$. The coefficients $a_k$ must satisfy certain conditions in
order for the scaling function to exist and satisfy the scaling equation. There
turn out to be an infinite number of sets of scaling coefficients for every even
$N > 2$. It is the choice of the $a_k$, from among this set, which determines the
detailed shape of $\varphi(x)$. A great variation exists in the possible functions $\varphi(x)$,
as illustrated by the examples shown in figures 2.1 and 2.2.
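
The report does not list the conditions on the $a_k$ explicitly. As a minimal sketch, the Python fragment below checks the two standard ones for the normalization used in equation (2.2), namely that the coefficients sum to 2 and that they are orthogonal to themselves under even shifts, using the well known four-coefficient Daubechies system ($N = 4$). The function names are hypothetical.

```python
import math


def daubechies4_scaling_coefficients():
    """Four-tap Daubechies scaling coefficients, normalized to sum to 2."""
    s3 = math.sqrt(3.0)
    return [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]


def shifted_product(a, m):
    """Sum over k of a[k] * a[k + 2m], treating out-of-range terms as zero."""
    n = len(a)
    return sum(a[k] * a[k + 2 * m] for k in range(n) if 0 <= k + 2 * m < n)


a = daubechies4_scaling_coefficients()
coefficient_sum = sum(a)          # expected to be 2 for this normalization
energy = shifted_product(a, 0)    # expected to be 2 (shift of zero)
cross = shifted_product(a, 1)     # expected to be 0 (shift of two taps)
print(coefficient_sum, energy, cross)
```

The zero value for the shifted product is what makes $\varphi(x)$ orthogonal to its integer translates, the second essential property listed above.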

Scaling  functions  may  be  selected  from  the  class  of  Daubechies  functions, 
which  have  several  important  characteristics.  They  are  relatively  smooth 
and  have  certain  approximation  properties  (i.e.,  vanishing  moments).  These 
systems will be referred to as $D_2, D_3, D_4, \ldots, D_g$, where $g$ is the genus of
the system.

The  scaling  function  is  the  basic  unit  from  which  a  level  of  detail  is 
constructed.  This  is  done  by  considering  the  set  of  functions  that  can  be 
represented  as  a  linear  combination  of  shifted  versions  of  the  scaling  function. 
That is, we define a collection of functions at scale level $j$, which we write
$V_j$, to be the set of functions that are linear combinations of the functions
$\varphi(2^j x - k)$, where $k$ is an integer. The factor $2^j$ multiplying $x$ has the effect
of shrinking the support of $\varphi$ to $(N-1)/2^j$, and the shift by $k$ moves these
small functions around by a fixed fraction of their support, independent of
scale. Thus a function's components at scale level $j$ are expressed by the
equation:

$$f_j(x) = \sum_{k \in \mathbb{Z}} c_{j,k} \, \varphi(2^j x - k) \qquad (2.3)$$

where $f_j$ is the part of $f$ resolvable at the level $j$.

This idea can also be expressed by stating that $V_j$ is the space spanned
by the set

$$\{ \varphi(2^j x - k) \mid k \in \mathbb{Z} \} \qquad (2.4)$$

This set of functions forms an orthonormal basis for $V_j$. Functions in $V_j$ are
uniquely expressible as linear combinations of the basis functions, and the
basis functions, once scaled by the normalization factor $2^{j/2}$, all have unit
“energy”. Thus the set of functions $\{\varphi(2^j x - k)\}$ forms an orthogonal set
of “templates” for the scale level $V_j$.

The  effect  of  performing  a  transform  with  such  a  set  of  basis  functions 
is  to  identify,  within  the  signal,  those  parts  or  components  that  are  similar 
to  the  basis  functions  at  the  given  scale  level.  Similarly,  a  Fourier  transform 
has  oscillatory  functions  as  a  basis,  and  identifies  the  relative  contribution  of 
each  frequency  to  the  overall  signal. 

With shifted versions of the scaling function as a basis, the Wavelet transform
will identify components that are similar to a particular shifted copy
of  the  scaling  function;  that  is,  a  representation  of  a  function  (in  the  scaling 
function  basis)  can  identify  features  locally  in  time,  since  the  scaling  function 
is  compactly  supported,  and  locally  in  scale,  because  the  scaling  function  has 
structure. 

There is another, equally good, set of orthogonal “templates” for this
scale level $V_j$. This set of “templates” gives a different type of information
than the one presented above. In our previous basis, all the resolution within
the scale level $j$ was in the temporal domain: each coordinate corresponded
to a position in time. The new basis will trade some of this temporal resolution
for some additional structural information; it will give us two sets
of coefficients, one set that represents large scale structure, while the other
set represents small (“fine”) scale structure. The process extracts or filters
out the components of the scale level $j$ that cannot be regarded as part of
the coarser scale level, $j - 1$. These are taken to represent the information
at scale $j$ that is small scale compared to that part of $V_j$ which is actually
large scale information embedded in the small scale space. This operation
captures the large scale features of a function $f_j$ in the scale level $j - 1$, and
retains the small scale features in a difference space.

2.3.2 Wavelets

This idea of dividing the scale level $V_j$ into a coarser version of itself, $V_{j-1}$,
and a difference space, which we will call $W_{j-1}$, can be compactly expressed
as an orthogonal splitting of the space $V_j$ into two perpendicular spaces $V_{j-1}$
and $W_{j-1}$:

$$V_j = V_{j-1} \oplus W_{j-1} \qquad (2.5)$$

The reason this can be done efficiently comes from the scaling equation.
Since $\varphi(x)$ can be expressed as a linear combination of translated versions of
$\varphi(2x)$, the coarser scale level $V_{j-1}$ is contained within the finer scale level $V_j$:

$$V_{j-1} \subset V_j \qquad (2.6)$$

Repetition of this argument shows that

$$V_{j-1} \subset V_j \subset V_{j+1} \subset V_{j+2} \cdots \qquad (2.7)$$

The difference space, $W_{j-1}$, contains all that remains when the coarser scale
information is removed. However, since $W_{j-1}$ is contained in $V_j$, it is also
expressible as a linear combination of translates of $\varphi(2^j x)$.

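The actual set of functions that are used to span the difference spaces is the
orthonormal basis formed from the functions

$$\psi(2^j x) = \sum_{k=0}^{N-1} (-1)^k a_{N-k-1} \, \varphi(2^{j+1} x - k) \qquad (2.8)$$

or in the case of $j = 0$,

$$\psi(x) = \sum_{k=0}^{N-1} (-1)^k a_{N-k-1} \, \varphi(2x - k). \qquad (2.9)$$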

Notice that the signs now alternate in the sum, and the order of the coefficients
$a_k$ has been reversed ($k \to N - k - 1$). These changes make $\psi(x)$
orthogonal to $\varphi(x)$. The full basis for $W_j$ is formed by taking shifted versions
of $\psi$, i.e., Basis($W_j$) = $\{ 2^{j/2} \psi(2^j x - k) \mid k \text{ an integer} \}$. The support of $\psi(x)$
is easily seen to be the same as the support of $\varphi(x)$, and this is true for the
shrunken versions as well, that is, the support of $\psi(2^j x - k)$ is the same as the
support of $\varphi(2^j x - k)$. The normalization term $2^{j/2}$ maintains unit energy
in the functions. Figures 2.3 and 2.4 are the wavelets that correspond to the
scaling functions presented in figures 2.1 and 2.2.
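
The sign alternation and order reversal can be checked numerically. The sketch below, which assumes the four-coefficient Daubechies system normalized to sum to 2 and uses hypothetical function names, builds the wavelet coefficients from the scaling coefficients and verifies that the two sets are orthogonal under even shifts.

```python
import math


def scaling_coefficients():
    """Four-tap Daubechies scaling coefficients, normalized to sum to 2."""
    s3 = math.sqrt(3.0)
    return [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]


def wavelet_coefficients(a):
    """Alternate the signs and reverse the order: b[k] = (-1)^k * a[N - k - 1]."""
    n = len(a)
    return [(-1) ** k * a[n - k - 1] for k in range(n)]


def cross_product(a, b, m):
    """Sum over k of a[k] * b[k + 2m], with out-of-range terms taken as zero."""
    n = len(a)
    return sum(a[k] * b[k + 2 * m] for k in range(n) if 0 <= k + 2 * m < n)


a = scaling_coefficients()
b = wavelet_coefficients(a)
cross_terms = [cross_product(a, b, m) for m in (-1, 0, 1)]
wavelet_sum = sum(b)
print(cross_terms, wavelet_sum)
```

All cross terms vanish, which is the coefficient-level statement that $\psi$ is orthogonal to $\varphi$ and its translates. The wavelet coefficients also sum to zero, so a constant signal produces no small scale output.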

Thus the transformation from the first representation of $V_j$, where $f_j$ was
expressed as a linear combination of shifted versions of $\varphi(2^j x)$, to the new
representation, where $f_j$ is expressed as a sum of translates of $\varphi(2^{j-1} x)$ and
translates of $\psi(2^{j-1} x)$, gives us new information about the shapes and structures,
perhaps frequencies, present in $f_j$. This is at the expense of some
temporal resolution, because the new basis functions are twice as long. The
new basis incorporates the inter-relationships among larger subsets of the
data, providing correlative information. As a result of the spectral refinement,
there has been a loss of temporal resolution.

The scaling function, being the basis for all the spaces $V_j$, forms the
connection between these spaces via the scaling equation. The basic wavelet,
$\psi$, represents the differences between the scale levels.

2.3.3  The  Wavelet  Transform 

The exchange of temporal resolution in $V_j$ for scale resolution in the division
of $V_j$ into $V_{j-1}$ and $W_{j-1}$ forms the basic unit of the Wavelet transform.
Since the definition is independent of scale, it can be repeatedly applied in
the same way. Furthermore, the operations involved in the transformation
require only the expansion coefficients of the function $f(x)$ in the basis at the
current scale, and not the actual values of the function. The computation is
very simple and efficient because of the close link between the functions $\varphi$
and $\psi$.
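
A minimal sketch of this basic unit, working only on expansion coefficients, is given below. It assumes the four-coefficient Daubechies system, a division by $\sqrt{2}$ to obtain unit-energy filters, and periodic (wrap-around) treatment of the block edges. None of these choices is taken from the report, and the function name is hypothetical.

```python
import math


def analysis_step(c, a):
    """Split coefficients c (even length) into coarse and detail halves.

    a holds scaling coefficients that sum to 2; dividing by sqrt(2) gives
    unit-energy filters. Indices wrap around (periodic edges, an assumption).
    """
    n, taps = len(c), len(a)
    h = [x / math.sqrt(2.0) for x in a]                      # scaling (low-pass) filter
    g = [(-1) ** k * h[taps - 1 - k] for k in range(taps)]   # wavelet (high-pass) filter
    coarse = [sum(h[k] * c[(2 * i + k) % n] for k in range(taps)) for i in range(n // 2)]
    detail = [sum(g[k] * c[(2 * i + k) % n] for k in range(taps)) for i in range(n // 2)]
    return coarse, detail


s3 = math.sqrt(3.0)
a = [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
coarse, detail = analysis_step(signal, a)
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for v in coarse) + sum(v * v for v in detail)
print(len(coarse), len(detail), energy_in, energy_out)
```

The step halves the number of points in each output while conserving the total energy, so no information is lost in the exchange.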

The  basic  operation  involved  in  a  Wavelet  transform  is  the  conversion  of 
temporal  resolution  into  structural  or  spectral  information.  This  basic  single 
step,  essentially  a  filter,  exchanges  half  the  temporal  resolution  of  a  signal 
for twice the frequency resolution; the product of the two remains the same.*
More importantly, this operation can be repeated to gain any desired level
of detail in the structural or spectral realm, while only imposing a reciprocal
loss of resolution in the temporal domain. This is in sharp contrast
to Fourier transform techniques, where one either gets all the available frequency
information, or none of it, with no intermediate stages of knowledge
available.

* This is a case of the Heisenberg Uncertainty Principle.

The wavelet transform allows one to move gradually between the two extremes
present in the Fourier transform, successively gaining shape or structure
information. In a sense, the wavelet transform interpolates between the
frequency,  or  structure,  domain  and  the  temporal  domain.  This  step  by  step 
transformation  can  be  understood  in  terms  of  trade-offs  between  relative  time 
resolution  and  relative  frequency  resolution,  but  its  features  are  most  simply 
understood  by  viewing  them  in  phase  space. 

Phase  space  is  a  two  dimensional  plane  in  which  we  place  frequency  (or 
more  generally,  some  distinguishing  structural  property)  on  the  y-axis  and 
time on the x-axis. Any time or frequency based transform will be characterized
by its division or partition of phase space into “resolution cells”. Each
resolution cell corresponds to a single basis function, and thus represents a
specific occurrence. The shape of the resolution cell reflects the characteristics
of the basis function, its width in time representing the support or extent
of  the  basis  function,  and  the  width  in  frequency  representing  its  bandwidth. 

If  we  consider  the  original  signal,  we  know  its  values  at  each  point  in 
time while we know nothing about its frequency characteristics. The resolution
cells are vertical strips, each representing the fine temporal resolution
and the complete lack of frequency information. When the signal is Fourier
transformed, exactly the opposite happens. Now the frequency content of the
signal is known perfectly, but all knowledge of location in time has been lost.
Now the resolution cells are horizontal strips of width equal to $1/2N$, each
reflecting  the  magnitude  of  a  single  frequency,  while  carrying  no  information 
about  when  that  frequency  was  present. 

In  general,  the  area  of  a  resolution  cell  cannot  be  decreased,  but  its  shape 
may change. If we assume that these cells remain rectangular, then a cell
may be made narrow in one direction, but only if it is simultaneously widened
in the other direction. This is exactly what happens when one applies the
basic wavelet scale separation operation. One exchanges a factor of two
in  the  width  of  the  resolution  cells  for  a  factor  of  one  half  in  their  height. 
This  operation  is  displayed  in  figure  2.5.  The  wavelet  transform  provides 
a  “zoom-lens”  approach  to  signal  representation.  Any  degree  of  structural 
information  can  be  acquired,  but  only  at  the  expense  of  enlarging  the  “field 
of  view”  temporally.  Conversely,  one  may  zoom  in  on  a  particular  temporal 
scale,  but  only  at  the  expense  of  global  structural  information. 

Figure 2.5: Conceptual Scale Separation Operation of Wavelet Transform.
(Diagram labels: Input; Small Scale or High-Pass.)

Since the transform is defined in terms of operations on the coefficients
of the representation, and not the actual values of the scaling or wavelet
functions, the output from a single stage of the transform is exactly what
the  next  stage  requires  for  input.  This  easily  pipelined,  recursive  structure 
is  what  makes  the  wavelet  transform  rapidly  computable.  While  many  such 
structures  are  made  possible  by  the  wavelet  transform,  one  in  particular, 
the  one-sided  or  Mallat  transform,  has  proven  to  be  exceptionally  useful  in 
analyzing  signals.  Figure  2.6  is  a  conceptual  diagram  of  a  3  level,  one-sided 
wavelet transform. The large scale ($\varphi$) output coefficients from the first stage
are the input for the second stage, etc. The number of data points at each
level is reduced by a factor of two by the conversion of temporal information
into spectral information. Figure 2.7 illustrates a four level, one-sided wavelet
transform. We have included plots of the Daubechies-5 basis functions for
this transformation. Figure 2.8 is a plot of the spectral characteristics of
the wavelet transform shown in figure 2.7. Notice the broad high frequency
response characteristics of the smallest scale wavelets, which correspond to
their  fine  temporal  resolution.  The  large  scale  wavelets  have  just  the  opposite 
characteristics,  sharp  spectral  response  and  coarse  temporal  resolution. 
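
The recursive structure of the one-sided transform can be sketched in a few lines. For brevity the fragment below uses the two-tap Haar system rather than the Daubechies-5 system plotted in figure 2.7. That substitution, the test signal and the function names are assumptions made only for this illustration.

```python
import math


def haar_step(c):
    """One scale separation step with the two-tap Haar system."""
    root2 = math.sqrt(2.0)
    coarse = [(c[2 * i] + c[2 * i + 1]) / root2 for i in range(len(c) // 2)]
    detail = [(c[2 * i] - c[2 * i + 1]) / root2 for i in range(len(c) // 2)]
    return coarse, detail


def one_sided_transform(signal, levels):
    """Repeatedly split only the large scale output, as in the Mallat structure."""
    coarse, details = list(signal), []
    for _ in range(levels):
        coarse, detail = haar_step(coarse)   # large scale output feeds the next stage
        details.append(detail)               # small scale output is kept as is
    return coarse, details


signal = [float(i % 4) for i in range(16)]
coarse, details = one_sided_transform(signal, levels=3)
sizes = [len(d) for d in details] + [len(coarse)]
print(sizes)
```

For a 16 point input and three levels the band sizes are 8, 4, 2 and 2, so the total number of coefficients equals the number of input points.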

The method outlined for the one-sided transform can be generalized
to include any “binary tree” structure desired. The basic transformation
unit is repeatedly applied, taking the outputs of one level as the inputs for
the next level.

Figure 2.6: Conceptual 3 Level, One-sided Wavelet Transform.


2.4  Wavelet  Signal  Processing  Methods 

Wavelet  signal  processing  methods  use  the  properties  of  wavelet  transforms 
to extract signal information about events, features, modes and other characteristic
behaviors. There is no single wavelet transform of a signal, but a
family  of  them,  all  equivalent  to  the  original  signal  in  information  content. 

The choice of wavelet technique will depend on the specific signal processing
requirements. Transient detection and harmonic analysis often call
for octave band decomposition that extracts the harmonic signatures of the
signals, preserves fine temporal resolution and is computationally very efficient.
The choice of wavelet basis functions can have a significant effect on
the ability to separate signals of a particular type from noise or other interference.
The technique is flexible in that additional spectral resolution can
be  obtained  if  required. 

Figure 2.7: Four Level, One-sided Wavelet Transform with Characteristic
Functions for the case of Daubechies-5.

Figure 2.8: Spectral Characteristics for the Four Level, One-sided Wavelet
Transform in figure 2.7. Note the narrow low-frequency response of the long
scaling function and the broad frequency response of the shortest wavelet.
(Axes: Response versus Frequency, as a Fraction of Full Bandwidth.)

A slight modification of this method can be applied to remove noise from
signals. Signal features that exist on a small number of scales can be isolated
by performing a wavelet transform and discarding coefficients at all the scales
except the “target” scales and then inverting the transform to reconstruct
the signal with the noise removed. The adaptive selection of coefficients is
possible based on fixed or floating thresholds.
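
The scale selection idea can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: it uses the two-tap Haar system, always retains the final large scale band, and selects detail bands by level number instead of by threshold. All names are hypothetical.

```python
import math

ROOT2 = math.sqrt(2.0)


def forward(signal, levels):
    """Haar analysis: return the final coarse band and a list of detail bands."""
    coarse, details = list(signal), []
    for _ in range(levels):
        pairs = [(coarse[2 * i], coarse[2 * i + 1]) for i in range(len(coarse) // 2)]
        details.append([(p - q) / ROOT2 for p, q in pairs])
        coarse = [(p + q) / ROOT2 for p, q in pairs]
    return coarse, details


def inverse(coarse, details):
    """Haar synthesis: undo forward(); details[0] is the finest band."""
    current = list(coarse)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(current, detail):
            rebuilt.extend([(s + d) / ROOT2, (s - d) / ROOT2])
        current = rebuilt
    return current


def keep_target_scales(signal, levels, target_levels):
    """Zero every detail band whose level (1 = finest) is not a target, then invert."""
    coarse, details = forward(signal, levels)
    kept = [d if level in target_levels else [0.0] * len(d)
            for level, d in enumerate(details, start=1)]
    return inverse(coarse, kept)


noisy = [2.0, 0.0, 2.0, 0.0, 6.0, 4.0, 6.0, 4.0]
cleaned = keep_target_scales(noisy, levels=2, target_levels={2})
print(cleaned)
```

In this example the rapid sample-to-sample alternation lives entirely in the finest band, so discarding that band leaves only the slower step between the two halves of the record.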

The  wavelet  transform  is  lossless  at  each  level  and  forms  a  hierarchy  of 
successively  finer  spectral  refinements.  The  signal  energy  at  any  node  in 
a wavelet filter bank can be further resolved in frequency, but will never
migrate into an adjacent sub-band. This allows the signal energy in each
node  to  be  evaluated  in  real  time  and  guide  the  development  of  the  filter 
bank  structure.  Low  energy  nodes  will  only  generate  sub-nodes  with  even 
less  energy.  The  ability  to  “prune”  the  processing  tree  significantly  reduces 
the  computational  cost  of  the  transform  method. 
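
A hedged sketch of energy-guided pruning is shown below. Because each split conserves energy, the energy of a node bounds the energy of everything beneath it, so a node that falls below a threshold need not be expanded. The Haar split, the threshold value and the path labels ("L" for large scale, "H" for small scale) are assumptions made for this illustration.

```python
import math


def split(c):
    """Energy-conserving Haar split into large scale and small scale halves."""
    root2 = math.sqrt(2.0)
    low = [(c[2 * i] + c[2 * i + 1]) / root2 for i in range(len(c) // 2)]
    high = [(c[2 * i] - c[2 * i + 1]) / root2 for i in range(len(c) // 2)]
    return low, high


def energy(c):
    return sum(v * v for v in c)


def grow_tree(c, threshold, max_depth, path=""):
    """Return {path: coefficients} for the leaves of an energy-pruned binary tree."""
    if max_depth == 0 or len(c) < 2 or energy(c) < threshold:
        return {path or "root": c}           # stop: too deep, too short, or too weak
    low, high = split(c)
    leaves = grow_tree(low, threshold, max_depth - 1, path + "L")
    leaves.update(grow_tree(high, threshold, max_depth - 1, path + "H"))
    return leaves


signal = [3.0, 1.0] * 4
leaves = grow_tree(signal, threshold=1e-9, max_depth=3)
for path in sorted(leaves):
    print(path, round(energy(leaves[path]), 6))
```

Here the full three level tree would have eight leaves. Two nodes with no energy are left unexpanded, so only six leaves are generated, and the leaf energies still add up to the energy of the input.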

The  structure  and  computational  complexity  of  several  wavelet  signal 
processing  methods  are  discussed  in  the  next  chapter. 


2.5 The Computational Complexity of Wavelet
Transforms

This  section  is  concerned  with  the  computational  efficiency  of  wavelet-based 
analysis techniques. The computational complexity of several types of wavelet
transforms is developed and comparisons are made to the computational
complexity  of  the  FFT.  The  main  results  are: 

• Evaluation of wavelets and scaling functions has $O(n)$ computational
complexity.

• Wavelet calculations are parallelizable and can be pipelined. Wavelet
transforms can be calculated in $O(n)$ operations and in $O(\log n)$ time
if concurrent computation is employed.

• Wavelet algorithms depend parametrically on the scaling coefficients,
so algorithms can be embodied in programs that are structurally independent
of the system of scaling coefficients.

• Wavelet stream processing techniques can run in real time at full resolution.
The calculation density, i.e., the number of operations per
datum, does not grow logarithmically, as it does with an FFT, or require
artificial framing, segmenting or windowing to render real-time
operation feasible.

• Certain classes of wavelet coefficient systems can be exactly computed
in a digital computer without roundoff error.

We will discuss both block and stream processing applications. Comparisons
with the FFT are natural here, since the FFT is a commonly used, well
understood,  and  highly  optimized  algorithm.  It  is  a  convenient  benchmark 
for  comparison.  We  ignore  pre-  and  post-processing  requirements  assuming 
that  the  output  from  the  wavelet  transform  is  the  desired  result. 

For the case of block processing, the input block size is the factor that
determines the computational cost. The first case is a wavelet transform that
resolves only a portion $R = 1/2^J$ of the total bandwidth, and resolves it as
finely as possible using recursive wavelet techniques. The simplest example
of such a transform is the familiar Mallat Transform, which calculates the
decomposition of a signal on the basis of scale, i.e., it “homes in” on low
frequencies. This fundamental structure requires

$$\mathrm{OPS}(\mathrm{Mallat}) = \alpha(K) \, N \, (1 - 1/2^J) \ \text{operations} \qquad (2.10)$$

where $\alpha(K)$ = ($K + 1$) multiplies and ($K$) additions per output point. The
number of input data is $N$, and $J$ is the finest level calculated; that is, the
basic wavelet decomposition operation is applied $J$ times. The wavelet transform
can be implemented as a finite impulse response (FIR) filter with the
number of “taps” equal to the number of nonzero wavelet coefficients. The
number of nonzero coefficients is $K$, which we have also called the length of
the wavelet coefficient matrix. Note that whatever the depth of the decomposition,
the operation count never exceeds $\alpha(K) N$. This operation count
also applies to any wavelet transform that “zooms in” on a single location
in phase space, allowing other side-bands to remain unchanged. These are
not partial wavelet transforms; each is a complete representation in a wavelet
basis. Each is, however, a partial frequency decomposition, but that is a
powerful advantage because only the minimum amount of computation
is performed.
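
Equation (2.10) can be evaluated directly. The sketch below reads $\alpha(K)$ as two separate per-point costs, $K + 1$ multiplies and $K$ additions, and assumes the block length is divisible by $2^J$ so that the count is an exact integer. The function name and the example values ($N = 1024$, $K = 8$) are hypothetical.

```python
def mallat_operation_count(num_inputs, num_taps, levels):
    """Multiplies and additions from equation (2.10).

    Uses alpha(K) = (K + 1) multiplies and K additions per output point,
    with N * (1 - 1/2^J) evaluated exactly in integers, so num_inputs is
    assumed to be divisible by 2**levels.
    """
    points = num_inputs - num_inputs // 2 ** levels
    return (num_taps + 1) * points, num_taps * points


counts = {}
for levels in (1, 3, 6):
    counts[levels] = mallat_operation_count(1024, 8, levels)
    print(levels, counts[levels])
```

As the depth grows the counts approach, but never reach, the limits $(K + 1) N$ multiplies and $K N$ additions, which is the bound stated above.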

In applications where one wishes to resolve frequencies (or some other
structure) to some pre-specified resolution (say $1/2^J$), and one is interested
in a small number of sub-bands, wavelets are very computationally efficient.
This case includes the Mallat Transform, and any other wavelet processing
scheme that generates only a small subset of the finest resolution cells.

For the case of stream processing, the comparison with the FFT is no
longer appropriate. An FFT simply cannot provide the pipelined operation
of a wavelet transform. Even a windowed or Short Time Fourier Transform
can only resolve successive segments or frames of the data, and within each
frame, it must perform a full FFT with the corresponding lack of flexibility
in  the  resolution.  On  the  other  hand,  for  any  fixed  filter  structure  (fixed  tree) 
wavelet transform technique that is designed to generate a specific decomposition
in frequency which has a finest resolution level $J$, the entire transform
can be implemented as a pipeline, and always has $O(n)$ complexity. The
coefficient multiplying $N$ in the number of operations required is determined
by how many subbands are expanded and the particular level of detail. The
complexity is bounded above by $O(NJ)$, and therefore for a stream processing
application, represents a constant processing delay for the calculation of
a full wavelet transform.


Chapter  3 

Wavelet  Transient  Signal 
Processing 


This chapter begins with a description of the transient signals used for this
study.  This  is  followed  by  a  description  of  the  signal  processing  algorithms 
used  for  signal  detection,  feature  extraction  and  classification.  The  final 
section  summarizes  the  results  of  the  signal  processing  experiments  that  were 
conducted  during  the  course  of  this  project. 


3.1  Transient  Signal  Data 

The transient signal data used in this study consisted of twenty-four records of
digitally sampled signals. These records contained the transient of interest
as well as interference. The interference was a mixture of white Gaussian
noise, quantization noise, power supply/instrumentation noise and bursts of
nearly constant frequency sinusoids. The sampling frequency was well above
the Nyquist rate, thus preserving fine signal structure.

Figure  3.1  is  a  typical  signal  record.  It  contains  strong  interference  com¬ 
ponents  in  the  first  half  of  the  record.  The  transient  signal  of  interest  is 
located  slightly  after  (to  the  right  of)  the  center.  The  final  third  of  the 
record  contains  only  noise.  Figures  3.2  through  3.25  are  plots  of  the  signals 
considered  in  this  study. 

The signals are numbered 1 through 24. They are also identified by a
source file identifier.

Figure 3.1: Typical signal and environment.

The signals were grouped into seven “truth” groups based on a priori
knowledge of the source of each signal. Figures 3.26 through 3.32 are stacked
plots of the transient signals (as segmented by the detection process) in each
group. All the signal groups seem reasonable except that signal number 10
in group 2 seems to be mis-assigned.

Figure 3.11: Signal number 10, (mmmSmmmd).

Figure 3.32: Group 7, (Signals 22-23).


3.2  Wavelet  Basis  Functions  and  Transform 
Topologies 

This study considered rank 2 Mallat type wavelet transforms as well as rank
3  and  4  uniform  type  wavelet  transforms.  The  signals  considered  in  this 
project  had  relatively  wide  bandwidth  which  suggested  an  octave  (Mallat) 
type  transform.  The  Mallat  type  transformations  proved  to  be  superior  early 
in  the  study  and  work  on  higher  rank  transforms  was  terminated  after  a  few 
weeks. 

The  source  signals  were  relatively  smooth  functions  embedded  in  near 
Gaussian noise. We selected the Daubechies wavelets due to their optimal
smoothness/vanishing moment characteristics. Wavelets from genus 2
through genus 10 were tested on the project data. The shorter systems (genera
2 and 3) seemed to be too susceptible to noise in the data. Genus 4
wavelets  had  good  time  resolution  without  sensitivity  to  noise.  Genera  5  and 
greater  tended  to  smear  details  due  to  the  length  of  the  filters. 

Daubechies genus 4 wavelets were selected for this study. This wavelet
system was used in a 6-level Mallat type transform topology.


3.3 Signal Processing Algorithms

This  section  presents  a  description  of  the  prototype  algorithms  and  the  results 
of  signal  processing  experiments  conducted  by  Aware  on  the  transient  data 
to  investigate  the  capabilities  of  wavelet  based  feature  extraction  methods. 
Of  particular  interest  was  the  demonstration  of  the  feasibility  of  performing 
feature  extraction  with  a  wavelet  based  approach  and  the  robustness  of  the 
method  with  respect  to  natural  signal  variations  and  noise. 

3.3.1  Detection  Algorithm 

All the transients in this study have strong cross-band time-frequency features
that were exploited to detect the signals. In addition, most of the
signals deviate from zero for a relatively long period of time. This characteristic
was used early in the study to facilitate detection, but was abandoned
later in the study when the full set of signals was considered. There were
two  signal  detection  algorithms  used  in  the  course  of  this  study: 

1.  Zero-crossing  method  with  local  energy  detector; 

2.  Cross-band  wavelet  method  with  peak-to-rms  detector. 

The detected transient was segmented into a 128 sample sub-record by
including  the  ten  samples  prior  to  the  leading  edge  of  the  detected  transient 
and  enough  samples  after  that  point  to  total  128  points. 
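
The segmentation rule can be written as a short Python sketch. The function name, the argument names and the decision to raise an error when the window does not fit inside the record are assumptions; the report does not say how records near an edge were handled.

```python
def segment_transient(record, leading_edge, pre_samples=10, length=128):
    """Cut a fixed-length sub-record around a detected transient.

    The window starts pre_samples before the detected leading edge and
    runs for length samples in total. Raising on an out-of-range window
    is an assumption made for this sketch.
    """
    start = leading_edge - pre_samples
    if start < 0 or start + length > len(record):
        raise ValueError("record too short for the requested segment")
    return record[start:start + length]


record = list(range(1000))            # stand-in for a sampled signal record
segment = segment_transient(record, leading_edge=500)
print(len(segment), segment[0], segment[10], segment[-1])
```

With a leading edge at sample 500 the sub-record covers samples 490 through 617, so the detected edge sits at position 10 of the 128 point window.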

3.3.2  Feature  Extraction  and  Classification  Algorithm 

The feature extraction algorithm consisted of a continuous over-sampled
wavelet transform based on the Daubechies length 8 (genus 4) wavelet. Six
levels of transform were applied, resulting in an octave band analysis of the
input data consisting of seven distinct bands (one low pass and six high
pass octaves). This analysis bank is depicted in Figure 3.33, where, for the
purpose of comparison, two signals each from the first two groups were fed
through the filtering structure.
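
A six-level analysis of this kind can be sketched in Python as follows. The sketch assumes that "continuous over-sampled" means an undecimated filter bank in which the filter taps are spread apart by a factor of two at each level. The circular boundary handling, the 1/sqrt(2) gain applied to each filter, and all function names are likewise assumptions, not details taken from the study.

```python
import math

# Daubechies length-8 (genus 4) low-pass taps, normalized to sum to sqrt(2).
DB4_LO = [
    0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
    -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]
# Quadrature-mirror high-pass taps: g[k] = (-1)^k * h[N-1-k].
DB4_HI = [(-1) ** k * DB4_LO[len(DB4_LO) - 1 - k] for k in range(len(DB4_LO))]


def circular_filter(x, taps, dilation):
    """Filter x with taps spaced `dilation` samples apart, wrapping at the ends."""
    n = len(x)
    return [
        sum(c * x[(t - k * dilation) % n] for k, c in enumerate(taps))
        for t in range(n)
    ]


def oversampled_transform(x, levels=6):
    """Return [high_1, ..., high_levels, low]; no band is decimated."""
    lo = [c / math.sqrt(2.0) for c in DB4_LO]
    hi = [c / math.sqrt(2.0) for c in DB4_HI]
    bands = []
    approx = list(x)
    for level in range(levels):
        dilation = 2 ** level
        bands.append(circular_filter(approx, hi, dilation))
        approx = circular_filter(approx, lo, dilation)
    bands.append(approx)
    return bands
```

For a 128-sample sub-record this returns seven bands of 128 samples each, six high pass octaves followed by the low pass residual.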

Shift  Invariance  and  oversampling 

The goal of the classifier portion of the contract was to design an algorithm
which was capable of distinguishing among the seven groups of example
signals provided for the study. Examination of the signals and the results of
the previous experiment suggested several important simplifying assumptions
and discriminating features of this particular class of signals:

1.  The  transients  as  extracted  by  the  detection  algorithm  are  highly  non¬ 
stationary. 

2.  Reliably  grouping  known  classes  requires  an  insensitivity  to  the  time 
alignment  of  the  transient  within  the  selected  time  window. 

3.  Differences  between  classes  tend  to  be  small  in  terms  of  spectral  or 
auto-correlative  measures. 

In order to both accommodate and exploit these characteristics, the
classification algorithm was designed to incorporate a distance measure which
would be shift invariant (and therefore unprejudiced by the initial alignment
of the transients), finely grained in the time dimension (over-sampled), and
based on coherent differences between instances of the transients.

Expanding  on  ideas  originated  by  Mallat  [4]  on  the  use  of  octave  band 
filtering  in  singularity  classification,  we  chose  a  reduced  representation  of  the 
input  signal  consisting  of  local  extrema  within  each  subband.  Thus  each 
subband  output  in  the  filter  tree  is  replaced  by  an  impulsive  signal  which  is 
only  nonzero  at  the  locations  where  the  subband  had  an  extreme  value  and 
is  equal  to  the  original  subband  at  those  locations.  In  this  representation 
it  is  easy  to  see  the  similarities  between  the  reduced  representation  for  each 
pair  of  signals  and  the  differences  between  them.  The  initial  representation 
for  the  signal  is  therefore  a  list  of  extrema  for  each  subband  along  with  the 
time  location  for  each  peak: 

$$\text{Transient Feature Vector} = \{\, F_i = (B(F)_i,\ X(F)_i,\ V(F)_i) \;\mid\; 1 \le i \le \text{number of peaks} \,\}$$

where

$B(F)_i$ = Band number for peak #$i$.

$X(F)_i$ = Temporal location for peak #$i$.

$V(F)_i$ = The value of peak #$i$.
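
The reduced representation can be sketched in Python as a list of (band, location, value) triples. The function names and the use of strict interior extrema (a sample larger or smaller than both of its neighbours) are assumptions for illustration; the report does not state how ties or end points were handled.

```python
def local_extrema(band, band_number):
    """Return (band number, location, value) triples at strict local extrema."""
    peaks = []
    for n in range(1, len(band) - 1):
        is_max = band[n] > band[n - 1] and band[n] > band[n + 1]
        is_min = band[n] < band[n - 1] and band[n] < band[n + 1]
        if is_max or is_min:
            peaks.append((band_number, n, band[n]))
    return peaks


def feature_vector(bands):
    """Concatenate the extrema of every subband into one list of triples."""
    features = []
    for band_number, band in enumerate(bands):
        features.extend(local_extrema(band, band_number))
    return features
```

Each triple plays the role of one $F_i$: its three entries correspond to $B(F)_i$, $X(F)_i$ and $V(F)_i$.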

Figure  3.34  illustrates  this  reduced  representation  for  a  subset  of  example 
signals. 

In order to further reduce the dimensionality of the signal representation,
these peaks were pruned to only the six largest of the set. Thus the final
representation of the signal prior to the computation of a pairwise distance
function is a list of six peak values, their band numbers and their locations
in time.
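
A minimal Python sketch of the pruning step follows. It assumes that "largest" refers to absolute value, since the extrema include minima as well as maxima, and that the surviving peaks are kept in band and time order; the function name `prune_peaks` is hypothetical.

```python
def prune_peaks(features, keep=6):
    """Keep the `keep` peaks of largest magnitude, ordered by band then time.

    Each peak is a (band number, location, value) triple.
    """
    largest = sorted(features, key=lambda peak: abs(peak[2]), reverse=True)[:keep]
    return sorted(largest, key=lambda peak: (peak[0], peak[1]))
```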

The first step in computing a pairwise distance function for these lists
of peaks is to perform an alignment of the lists. By this we mean a revised
list of peaks for each transient such that the number of peaks within each
band is the same in both lists and time ordering is preserved within each
band. Such an alignment provides an initial indication of the similarity of
two transients (whether the distribution of peaks among the bands is similar
or not) and arranges the data conveniently for later analysis stages. Once
the alignment is accomplished, peaks in one list are each naturally
associated with the peak in the corresponding location in the other list.

The first method used to perform this alignment was based on excision.
Peaks in each band were removed until the number of peaks within each band
was the same for both lists. In each band, an interval of adjacent (sequential)
peaks was chosen from the list with the greater number of peaks, and these
were associated pairwise with the peaks in the list with fewer peaks. The
remaining peaks in the list were removed from the calculation. The interval
was chosen by minimizing the Euclidean distance between the peaks forming
a given interval and the peaks in the other list. The temporal location of
peaks was not used except to provide ordering of the peaks within a band.
The distance assigned to the pair of lists is then the sum over the Euclidean
distances in each band. Thus

$$d_1(F_i, F_j) = \Bigl( \sum_{(i,j)\,\in\,\text{matched peaks}} \cdots \Bigr) \tag{3.1}$$

This scheme is in some sense a worst case distance function. It uses a
minimal amount of information (a maximum of 6 values, the peak values
themselves, are used to identify a signal), and provides no penalty for
the peaks which do not match in the chosen alignment; these were simply
removed from the calculation.
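
The excision method, as described in the text, can be sketched in Python. The sketch takes the description literally: within each band the shorter list is compared with every interval of adjacent peaks of the same length in the longer list, the smallest Euclidean distance is kept, and the band distances are summed. Peaks are assumed to be (band, time, value) triples, and all function names are hypothetical.

```python
import math


def band_values(peaks, band):
    """Peak values belonging to one band, in time order."""
    return [v for b, t, v in sorted(peaks, key=lambda p: p[1]) if b == band]


def excision_band_distance(values_a, values_b):
    """Smallest Euclidean distance between the shorter list and any interval
    of adjacent peaks of equal length taken from the longer list."""
    short, long_ = sorted((values_a, values_b), key=len)
    best = math.inf
    for start in range(len(long_) - len(short) + 1):
        window = long_[start:start + len(short)]
        dist = math.sqrt(sum((s - w) ** 2 for s, w in zip(short, window)))
        best = min(best, dist)
    return best


def excision_distance(peaks_a, peaks_b, n_bands=7):
    """Sum of the per-band distances; unmatched peaks are simply dropped."""
    return sum(
        excision_band_distance(band_values(peaks_a, b), band_values(peaks_b, b))
        for b in range(n_bands)
    )
```

Note that a band in which one transient has no peaks contributes nothing to the total, which is the absence of penalty discussed above.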

The  results  of  this  algorithm  in  separating  the  various  transients  into 
their  respective  groups  are  shown  in  Figure  3.3.2.  It  is  easily  seen  that  even 
this  simple  approach  succeeds  in  placing  related  transients  at  small  distances 
from  one  another  (within  a  group,  all  pairs  should  be  enclosed  by  a  single 
contour),  and  is  also  effective  at  separating  the  various  groups  from  one 
another.  However,  the  decision  levels  that  arise  from  this  algorithm  are  not 
very  well  separated,  i.e.,  the  distances  between  groups  are  of  the  same  order 
of  magnitude  as  those  within  groups. 

The second method attempted to improve on this situation by incorporating
all of the data from the pruned peak lists. This method uses the temporal
location information to improve the match and computes a distance function
which weights the contribution of a given peak-to-peak distance by the
distance between the temporal locations of the peaks. For this purpose, the
mean of the temporal data is removed, thereby making these distances into
relative delays between the various peaks.

The  other  improvement  over  method  1  is  to  insert  peaks  in  a  list  with 
fewer  peaks  in  a  given  band,  rather  than  removing  them  from  the  list  with  the 
greater  number.  This  has  the  effect  of  adding  a  penalty  to  the  distance  func¬ 
tion  whenever  a  peak  of  significant  magnitude  in  one  list  cannot  be  matched 
with  a  corresponding  peak  in  the  other  list. 

The alignment technique for this method is essentially the same as for the
first method, except that peaks of zero magnitude are inserted when there is
a mismatch, as opposed to the excision exercised by method 1. The distance
function is then

$$d_2(F_i, F_j) = \Bigl( \sum_{(i,j)\,\in\,\text{matched peaks}} \cdots \Bigr) \tag{3.2}$$
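
The zero-insertion alignment and the removal of the mean peak time can be sketched in Python. The exact weighting used in the study is not recoverable from the text, so the default `weight` function here (one plus the absolute difference in relative delay) is purely a placeholder assumption, as are the function names and the choice to place each inserted zero-magnitude peak at the time of the peak it stands in for.

```python
import math


def remove_mean_time(peaks):
    """Convert absolute peak times into delays relative to the mean time."""
    if not peaks:
        return []
    mean_t = sum(t for _, t, _ in peaks) / len(peaks)
    return [(b, t - mean_t, v) for b, t, v in peaks]


def pad_band(short, long_):
    """Match `short` to the best interval of `long_` (by peak value), then
    insert zero-magnitude peaks opposite the unmatched peaks of `long_`."""
    m = len(short)

    def cost(start):
        return sum((s[1] - l[1]) ** 2 for s, l in zip(short, long_[start:start + m]))

    best = min(range(len(long_) - m + 1), key=cost)
    padded = [(t, 0.0) for t, _ in long_]
    padded[best:best + m] = short
    return padded


def insertion_distance(peaks_a, peaks_b, n_bands=7,
                       weight=lambda dt: 1.0 + abs(dt)):
    """Weighted distance between two peak lists of (band, time, value) triples."""
    a, b = remove_mean_time(peaks_a), remove_mean_time(peaks_b)
    total = 0.0
    for band in range(n_bands):
        pa = sorted((t, v) for bb, t, v in a if bb == band)
        pb = sorted((t, v) for bb, t, v in b if bb == band)
        if len(pa) < len(pb):
            pa = pad_band(pa, pb)
        elif len(pb) < len(pa):
            pb = pad_band(pb, pa)
        total += sum(weight(ta - tb) * (va - vb) ** 2
                     for (ta, va), (tb, vb) in zip(pa, pb))
    return math.sqrt(total)
```

With this placeholder weighting, a peak of magnitude 3 that has no counterpart in the other list contributes a penalty of 3 to the distance, whereas the excision method would have ignored it.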

The  results  of  the  second  algorithm  in  separating  the  various  transients 
into  their  respective  groups  are  shown  in  Figure  3.3.2.  This  approach  is  very 
successful  in  placing  related  transients  at  small  distances  from  one  another 
(within  a  group,  all  pairs  should  be  enclosed  by  a  single  contour),  and  is 
also  effective  at  separating  the  various  groups  from  one  another.  The  perfor¬ 
mance  in  this  situation  is  perfect  if  one  considers  that  signal  10  seems  to  be 
misgrouped. 

Figure 3.43: Local extrema for transient 9

Figure 3.44: Local extrema for transient 10

Figure 3.45: Local extrema for transient 11

Figure 3.46: Local extrema for transient 12

Figure 3.47: Local extrema for transient 13

Figure 3.48: Local extrema for transient 14

Figure 3.49: Local extrema for transient 15

Figure 3.50: Local extrema for transient 16

Figure 3.51: Local extrema for transient 17

Figure 3.52: Local extrema for transient 18

Figure 3.53: Local extrema for transient 19

Figure 3.54: Local extrema for transient 20

Figure 3.57: Local extrema for transient 23

Figure 3.58: Local extrema for transient 24


Chapter  4 

Conclusion  and 
Recommendations 


This project demonstrated the feasibility and utility of employing wavelet
transform based methods in the transient signal detection and feature
extraction problem. Wavelet based transient signal analysis methods were
shown to have the following desirable properties:

•  Robust  transient  detection  in  the  presence  of  strong  sinusoidal  noise 
components; 

•  Compact  signal  representation  that  allows  a  simple  (low  dimensional) 
classifier  design  to  achieve  near  perfect  signal  separation;  and 

•  Low  computational  complexity. 

The  logical  ‘next  steps’  required  to  develop  this  technology  are: 

1. Continue development on a larger class of transient signals. A question
of particular importance is how the dimensionality of the classifier
grows as a function of the size of the problem under consideration.

2. Development of adaptive versions of these signal processing algorithms
would enable the fielding of signal analysis systems that could react to
the observed environment.

3. Development of a high speed implementation of this technology would
allow rapid progress in the research arena as well as enabling the
technology to be demonstrated on a meaningful problem ‘in the field.’

4. Development of hybrid classification systems that mate wavelet feature
extraction with neural networks or other state-of-the-art classifier
designs. There are many open questions regarding the requirements for
the feature extraction and classifier portions of these high-performance
hybrids.


Appendix  A 
List  of  Symbols 


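$\mathbf{D}$ := Ring of dyadic rational numbers
:= $\{\, x : x = m/2^n,\ m, n \in \mathbf{Z},\ n \ge 0 \,\}$

$\mathbf{R}$ := Field of real numbers.

$\mathbf{Z}$ := Ring of rational integers.

$\mu$ := Rank of the wavelet system. $\mu \in \mathbf{Z}$ and $\mu \ge 2$.

$[a]$ := Wavelet Coefficient Matrix (“WCM”).

$$[a] := \begin{pmatrix} a^0_0 & \cdots & a^0_{N-1} \\ \vdots & & \vdots \\ a^{\mu-1}_0 & \cdots & a^{\mu-1}_{N-1} \end{pmatrix}$$

where $N$ is an integer multiple of $\mu$ ($N = g\mu$ where $g$ is called the genus).
If $\mu = 2$ then we write:

$$[a] := \begin{pmatrix} a_0 & \cdots & a_{N-1} \\ b_0 & \cdots & b_{N-1} \end{pmatrix}$$

$N$ := Number of columns in a WCM. $N$ is an integer multiple of $\mu$, i.e. $N = g\mu$.
If $\mu = 2$ then $N$ is even.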

Appendix  B 

Wavelet  Transform  Theory 


B.1  Wavelets  and  Wavelet  Transforms 

Most signals in science and engineering are modeled as mathematical
functions for purposes of analysis. In order to separate or examine certain
important features or characteristics of the signal, the function is often
expanded in terms of basis functions that span the space or a subspace that
the signals of interest reside in. The most common example of this is the
Fourier transform where a signal that originates in the time domain is
reformulated in the frequency domain by expanding the function in terms of
trigonometric or complex exponential basis functions. This basis is most
appropriate when the signals have periodic components or are produced by
systems that are modeled by constant coefficient differential or difference
equations.

Consider a periodic, possibly complex-valued, signal $g(t)$ that is square
integrable over the range $\{0 \le t \le 1\}$ and with period one so that

$$g(t) = g(t + 1). \tag{B.1}$$

This function can be expanded in a Fourier series of the form

$$g(t) = \sum_{n=-\infty}^{\infty} b_n\, e^{2\pi i n t} \tag{B.2}$$

with the coefficients given by

$$b_n = \int_0^1 g(t)\, e^{-2\pi i n t}\, dt, \tag{B.3}$$

which is an inner product of $g(t)$ with the basis functions. Similarly, one
defines the wavelet transform with respect to a basis of wavelet functions.

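The wavelet basis is generated by a $\mu \times N$ matrix $[a]$, where $\mu$ and $N$
are positive integers and $N$ is a multiple of $\mu$. The multiplier $g$ is called the
genus of the system and $N = g\mu$. The matrix

$$[a] := \begin{pmatrix} a^0_0 & \cdots & a^0_{N-1} \\ \vdots & & \vdots \\ a^{\mu-1}_0 & \cdots & a^{\mu-1}_{N-1} \end{pmatrix} \tag{B.4}$$

is a wavelet coefficient matrix (“WCM”) if it satisfies the scaling conditions

$$\sum_{k=0}^{N-1} a^i_k = \mu\, \delta_{0,i} \tag{B.5}$$

$$\sum_k a^i_{k+\mu l}\, \overline{a^j_k} = \mu\, \delta_{i,j}\, \delta_{0,l} \tag{B.6}$$

where $\delta_{i,j}$ equals 1 if $i = j$ and 0 otherwise. The overbar denotes complex
conjugation and $l$ is an integer. The sums over $k$ are finite sums since only
finitely many of the numbers $a^i_k$ are different from zero.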
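
As a numerical illustration for the case $\mu = 2$, the following Python sketch checks the Daubechies length 8 (genus 4) coefficients against the scaling conditions. It assumes the conditions take the form $\sum_k a^i_k = \mu\,\delta_{0,i}$ and $\sum_k a^i_{k+\mu l}\,\overline{a^j_k} = \mu\,\delta_{i,j}\,\delta_{0,l}$, that the scaling row is normalized to sum to 2, and that the wavelet row is obtained from the scaling row by the usual alternating flip. The variable and function names are hypothetical.

```python
import math

# Daubechies length-8 low-pass taps, normalized to sum to sqrt(2).
H = [
    0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
    -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]
MU = 2
N = len(H)
A0 = [math.sqrt(2.0) * h for h in H]                # scaling row, sums to MU
A1 = [(-1) ** k * A0[N - 1 - k] for k in range(N)]  # wavelet row (alternating flip)
ROWS = [A0, A1]


def linear_sum(i):
    """Sum over k of a^i_k."""
    return sum(ROWS[i])


def quadratic_sum(i, j, l):
    """Sum over k of a^i_{k + MU*l} * a^j_k.

    The coefficients are real, so the complex conjugate is omitted.
    """
    return sum(
        ROWS[i][k + MU * l] * ROWS[j][k]
        for k in range(N)
        if 0 <= k + MU * l < N
    )
```

Here `linear_sum(0)` is 2 and `linear_sum(1)` is 0 to within rounding, and `quadratic_sum(i, j, l)` is 2 when `i == j` and `l == 0` and 0 otherwise. The length is $N = 8 = g\mu$ with genus $g = 4$.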

The positive integer $\mu$ is called the multiplier of the wavelet system and
$N$ is called its length.

This matrix of numbers provides coefficients for the vector of recursions

$$\varphi^i[a](x) = \sum_{k=0}^{N-1} a^i_k\, \varphi^0[a](\mu x - k) \tag{B.7}$$

which implicitly define the wavelet scaling function $\varphi^0[a]$ and explicitly define
the basic wavelet functions $\varphi^i[a](x)$, $1 \le i < \mu$. Observe that only $\varphi^0[a]$
appears on the right hand side. The functions $\varphi^i[a](x)$ are defined for all real
numbers $x \in \mathbf{R}$.

The fundamental fact about systems of compactly supported wavelets is
that the collection of functions

$$\mathrm{Basis}[a] := \{\, \cdots \;:\; 0 < i < \mu,\ j, k \in \mathbf{Z} \,\}$$

forms a basis for spaces, in particular for $L^2(\mathbf{R})$.

We now focus on the case of $\mu = 2$ and will use the following simplified
notation for the scaling function $\varphi(t) = \varphi^0[a](t)$ and the wavelet function
$\psi(t) = \varphi^1[a](t)$ where $t \in \mathbf{R}$. It can be shown that if the coefficients of
the scaling equation (B.7) satisfy the wavelet conditions, stated above, the
solution $\varphi(t)$ will be orthogonal to integer translates of itself and can be
normalized such that

$$\langle \varphi(t), \varphi(t - k) \rangle = \int \varphi(t)\, \varphi(t - k)\, dt = \delta_{0,k}. \tag{B.8}$$

This means the set of basis functions

$$\varphi_l(t) = \varphi(t - l) \tag{B.9}$$

spans a subspace $V_0$ in $L^2$ and the coefficients of an expansion within this
subspace can be calculated as simple inner products. The feature of scaling
functions that makes them attractive for signal processing is their ability to
model signal properties that are related to the independent variable $t$. One
can increase the size of the subspace spanned by the scaling functions by
using $\varphi_{j,k}(t) = 2^{j/2} \varphi(2^j t - k)$ which spans a subspace $V_j$. One can show
that $V_0 \subset V_1 \subset V_2 \subset \cdots$.

The features of a signal can often be better described by defining a slightly
different set of orthogonal basis functions that span the differences between
the spaces spanned by the various scales of the scaling function. These new
functions are the wavelets. The basic wavelet is defined in terms of the scaling
function by

$$\psi(t) = \sum_k b_k\, \varphi(2t - k). \tag{B.10}$$

It is the prototype of a class of orthonormal basis functions of the form

$$\psi_{j,k}(t) = 2^{j/2}\, \psi(2^j t - k) \tag{B.11}$$

where $2^j$ is the scaling of $t$, $2^{-j}k$ is the translation in $t$, and $2^{j/2}$ maintains
the unity norm of the wavelet. We shall say that $j$ is the base-2 logarithm
of the scale. If $W_j$ is the subspace of $L^2 = L^2(\mathbf{R})$ spanned by the integer
translates of the wavelet $2^{j/2}\psi(2^j t)$, additional disjoint subspaces are spanned
by integer translates of the wavelets for each different scale index $j$, that is,
by the functions $\psi_{j,k}(t)$. The relationship of the various subspaces can be
seen from the following expressions:

$$V_0 \subset V_1 \subset V_2 \subset \cdots \subset L^2 = L^2(\mathbf{R}), \tag{B.12}$$

$$V_0 \oplus W_0 = V_1, \tag{B.13}$$

$$V_{j-1} \oplus W_{j-1} = V_j, \tag{B.14}$$

$$L^2 = V_0 \oplus W_0 \oplus W_1 \oplus \cdots \tag{B.15}$$

and indeed, if we allow $j$ to run over all integers, then

$$L^2 = \cdots \oplus W_{-2} \oplus W_{-1} \oplus W_0 \oplus W_1 \oplus \cdots \oplus W_j \oplus \cdots. \tag{B.16}$$

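This states that the set of basis functions formed from $\varphi_l(t)$ and $\psi_{j,k}(t)$ span
all of $L^2$ and, therefore, any function in $L^2$ can be written

$$g(t) = \sum_{l=-\infty}^{\infty} c_l\, \varphi_l(t) + \sum_{j=0}^{\infty} \sum_{k=-\infty}^{\infty} d_{j,k}\, \psi_{j,k}(t) \tag{B.17}$$

with the coefficients expressed by

$$c_l = \int g(t)\, \varphi_l(t)\, dt \tag{B.18}$$

and

$$d_{j,k} = \int g(t)\, \psi_{j,k}(t)\, dt. \tag{B.19}$$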

The basis functions $\varphi_l(t)$ and $\psi_{j,k}(t)$ are numerical valued functions of
numerical variables; they have no physical dimension. In the expansion formula
(B.17) the argument $2^j t - k$ of $\psi$ is a pure number so, writing

$$2^j t - k = 2^j \left( t - \frac{k}{2^j} \right),$$

the quantity $2^j$ has the dimension of inverse time, i.e. frequency, and $k/2^j$
has the dimension time.

These wavelet coefficients completely and uniquely describe the original
signal and can be used to represent it in a way similar to Fourier coefficients.
Because of the orthonormality of the basis functions, there is a version of
Parseval’s theorem that relates the energy of the signal $g(t)$ to the energy in
each of the wavelet expansion components and their wavelet coefficients by
the formula

$$\|g\|^2 = \sum_l |c_l|^2 + \sum_{j \ge 0} \sum_{k=-\infty}^{\infty} |d_{j,k}|^2. \tag{B.20}$$

This is one reason why the orthonormality is so important. Daubechies [2]
showed that the translates of the scaling function and the translated dilations
of the wavelets are orthonormal, and all of these functions have compact
support (i.e. are non-zero only over a finite region) if there are only a finite
number of non-zero coefficients $a_k$ in the recursive scaling equation (B.7).
This provides the time localization that is particularly desirable for analyzing
both the time and the frequency behavior of transient signals.

Note that there is an infinite set of scaling functions and wavelets that
can be obtained by choosing different coefficients $a_k$ in (B.7).


B.2  Energy  and  Parseval’s  Theorem 

Orthonormal basis systems allow direct calculation and interpretation of the
energy in a signal partitioned in both the time and the expansion domains.
Parseval’s theorem for the Fourier series (B.2) states

$$\int_0^1 |g(t)|^2\, dt = \sum_{n=-\infty}^{\infty} |b_n|^2. \tag{B.21}$$

The “power” in a signal is proportional to the square of the signal (e.g.
voltage, current, force, or velocity) and, therefore, the energy is given by
the integral of the square of the signal magnitude. Parseval’s theorem states
how the total energy is partitioned in the frequency domain in terms of the
partition provided by the orthonormal basis functions. For the general
wavelet expansion of (B.17), Parseval’s theorem is

$$\int_{-\infty}^{\infty} |g(t)|^2\, dt = \sum_{l=-\infty}^{\infty} |c_l|^2 + \sum_{j=0}^{\infty} \sum_{k=-\infty}^{\infty} |d_{j,k}|^2 \tag{B.22}$$

with the energy in the expansion domain partitioned in time by $l$ and $k$ and
in scale by $j$. For the case of periodic functions, the relationship reduces to

$$\int_0^1 |g(t)|^2\, dt = |c_0|^2 + \sum_{j=0}^{\infty} \sum_{k=1}^{2^j} |d_{j,k}|^2. \tag{B.23}$$
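
A discrete analogue of this energy partition can be checked numerically. The Python sketch below uses the orthonormal Haar transform rather than the Daubechies system, purely to keep the example short. It splits a sequence of length $2^J$ into one coarse scaling coefficient and detail coefficients at scales $j = 0, \ldots, J-1$, with $2^j$ coefficients at scale $j$. The function names are hypothetical.

```python
import math


def haar_dwt(x):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Returns (c, details): c is the single coarsest scaling coefficient and
    details[j] holds the 2**j detail coefficients at scale j (coarsest first).
    """
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) / math.sqrt(2.0) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return approx[0], details


def energy(values):
    """Sum of squared magnitudes."""
    return sum(v * v for v in values)
```

For any real input of suitable length, `energy(x)` equals `c * c` plus the sum of `energy(d)` over the detail bands, up to rounding. The energy of the signal is thus partitioned among the expansion coefficients by scale and position.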

One can show that $|\hat{\psi}(f)| \to 0$ as $f \to 0$ and also as $f \to \infty$. Therefore,
there will be a band of frequencies where most of the energy in $\hat{\psi}(f)$ is
concentrated. Likewise, for many signals, the energy will be concentrated
in a region of the $(j, k)$ plane. Because of this concentration, the energy of
the signal $g(t)$ at frequency $n$ and at scale $j$ and time $k$ is approximately
measured by

$$|d_{j,k}|^2. \tag{B.24}$$

If most of the energy in $\hat{\psi}(f)$ occurs around frequency $f_0$, then $f_0 = n/2^j$
relates the dominant Fourier frequency $f_0$ to the dominant wavelet scale $j$.
Scale and frequency are independent primitive concepts, but the selection
of a wavelet basis establishes a connection between them, and the results of
this chapter allow one to move between the two descriptions, using the one
most appropriate for a particular problem. The simple partitioning of the
energy content of a signal among frequencies has been generalized to include
the parameters of time and scale. In addition, we have at our disposal the
choice of wavelet systems, which is controlled by the choice of $a_k$ in (B.7),
and determines the detailed nature of the relationship between frequency and
scale.